This is the Optimization Demonstration for MARS

Approach

Unconstrained Optimization

  1. Use a combination of prediction and interpolation to predict all values of time and accuracy under different hardware/software configurations.
  2. Find the combination of algorithm inputs that maximizes accuracy. If there are ties, break them by using the point that requires the least data.
  3. Find the costs associated with running the algorithm with those inputs on all different hardware configurations.
  4. Find the combination of hardware that jointly minimizes cost and time.
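The steps above can be sketched in R. Everything here is hypothetical scaffolding: `preds` stands in for a data frame of predictions over candidate configurations, and all column names are assumptions, not the document's actual objects.

```r
## Minimal sketch of steps 2-4, assuming a hypothetical data frame `preds`
## with one row per candidate configuration and (hypothetical) columns:
## trainingExamples, covariates, cores, GBMemory, accuracy, seconds, cost.

## Step 2: maximize accuracy, breaking ties with the least data
winners <- preds[preds$accuracy == max(preds$accuracy), ]
inputs  <- winners[which.min(winners$trainingExamples), ]

## Step 3: every hardware configuration at those fixed inputs
hw <- preds[preds$trainingExamples == inputs$trainingExamples &
            preds$covariates == inputs$covariates, ]

## Step 4: jointly minimize time and cost with a standardized distance
d <- sqrt((hw$seconds / sd(hw$seconds))^2 + (hw$cost / sd(hw$cost))^2)
hw[which.min(d), ]
```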

Data-constrained Optimization

  1. Use a combination of prediction and interpolation to predict all values of time and accuracy under different hardware/software configurations.
  2. Subset the accuracy surface produced above to the amount of data available. The maximizing point will fall in the upper right corner of the subsetted space.
  3. Find the costs associated with running the algorithm with the accuracy-maximizing point on all different hardware configurations.
  4. Find the combination of hardware that jointly minimizes time and cost.
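The data constraint amounts to one extra filter before the maximization. A sketch, again against a hypothetical `preds` prediction grid (column names assumed), using the document's `threshold.numTex` data limit:

```r
## Step 2: restrict the accuracy surface to the data actually available
feasible <- preds[preds$trainingExamples <= threshold.numTex, ]
## The maximizing point falls in the upper right corner of this subset
inputs <- feasible[which.max(feasible$accuracy), ]
```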

Cost-constrained Optimization

  1. Use a combination of prediction and interpolation to predict all values of time and accuracy under different hardware/software configurations.
  2. Subset the space produced above to the amount of time and money that can be spent on modelling.
  3. Working backwards now, find the accuracies that can be produced in the limited time.
  4. Using the subset of accuracy space, find the combination of algorithm inputs that maximizes accuracy.
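Working backwards, the cost constraint filters on time and money first. A sketch against the same hypothetical `preds` grid, using the `threshold.time` and `threshold.cost` variables defined in the setup chunk:

```r
## Steps 2-3: keep only configurations that finish within budget
affordable <- preds[preds$seconds <= threshold.time &
                    preds$cost    <= threshold.cost, ]
## Step 4: the best accuracy reachable inside that budget
affordable[which.max(affordable$accuracy), ]
```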
knitr::opts_chunk$set(cache=TRUE, echo=F, warning=F, error = F, message=F)
knitr::opts_knit$set(root.dir = "/users/scottsfarley/documents")
setwd("/users/scottsfarley/documents")
library(parallel)
library(doParallel)
## Loading required package: foreach
## Loading required package: iterators
library(akima)
library(ggplot2)
options(java.parameters = "-Xmx1500m")
library(bartMachine)
## Loading required package: rJava
## Loading required package: bartMachineJARs
## Loading required package: car
## Warning: replacing previous import 'lme4::sigma' by 'stats::sigma' when
## loading 'pbkrtest'
## Loading required package: randomForest
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
## 
##     margin
## Loading required package: missForest
## Loading required package: itertools
## Welcome to bartMachine v1.2.3! You have 1.4GB memory available.
## 
## If you run out of memory, restart R, and use e.g.
## 'options(java.parameters = "-Xmx5g")' for 5GB of RAM before you call
## 'library(bartMachine)'.
bartMachine::set_bart_machine_num_cores(3)
## bartMachine now using 3 cores.
library(reshape2)
library(ggdendro)

threshold.time <- 22 ##seconds
threshold.cost <- Inf ##cents
threshold.numTex <- 45 ##number of training examples

First, get the training data and fit the model. Perform some skill checks on it.

## bartMachine initializing with 50 trees...
## bartMachine vars checked...
## bartMachine java init...
## bartMachine factors created...
## bartMachine before preprocess...
## bartMachine after preprocess... 6 total features...
## bartMachine sigsq estimated...
## bartMachine training data finalized...
## Now building bartMachine for regression ...
## serializing in order to be saved for future R sessions...done
## bartMachine initializing with 50 trees...
## bartMachine vars checked...
## bartMachine java init...
## bartMachine factors created...
## bartMachine before preprocess...
## bartMachine after preprocess... 6 total features...
## bartMachine sigsq estimated...
## bartMachine training data finalized...
## Now building bartMachine for regression ...
## serializing in order to be saved for future R sessions...done

Choose a finite number of possible solutions to the model. Ideally, we would want every single combination of predictor variables [0, Inf]. This is obviously intractable. Moreover, I only have data for a subset of that space anyway. So randomly sample the subspace in which I have data to make the problem possible to solve.

Using that subset of data and the models we fit previously, predict each candidate configuration of algorithm inputs and hardware variables for execution time and SDM accuracy.
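The prediction step might look like the following sketch. `time.bart` and `accuracy.bart` stand in for the two fitted bartMachine models, and `observed` for the experimental data; all three names are assumptions. `predict` on a bartMachine object returns the posterior mean.

```r
## Randomly sample candidate configurations from the observed subspace
candidates <- observed[sample(nrow(observed), 1000, replace = TRUE),
                       c("trainingExamples", "covariates", "cores", "GBMemory")]

## Posterior-mean predictions of execution time and SDM accuracy
candidates$seconds  <- predict(time.bart, candidates)
candidates$accuracy <- predict(accuracy.bart, candidates)
```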

Plot the posterior means of the accuracy models against the algorithm inputs that should control accuracy. In this case, these are number of training examples and number of covariates.

The accuracy clearly varies from low (few training examples and few covariates) to very high (many covariates, many training examples). Perhaps more data would be helpful here, but we work with what we have. Our task is to find the combination of inputs that results in the highest-accuracy model. If there’s a tie, find the combination that needs the least data.

Now, we know the combination of algorithm inputs that results in the highest accuracy. The figure below shows the combination identified on the training examples and covariates axes. This combination of training examples and number of covariates can be run on any combination of hardware, some of which might be suboptimal. Thus, at this point, we’ve solved half of our challenge: algorithm inputs have been optimized; now it’s time to optimize hardware.

## [1] "Accuracy is maximized at 1000 training examples and 5 predictors."

In theory, the hardware parameters should not affect the SDM accuracy. We can test this assumption here by plotting the accuracies obtained for this combination of algorithm inputs against modeled accuracy on the number of CPUs and amount of memory. If the assumption is valid, the plot should show no change in either the horizontal or vertical direction. We see that there is, in fact, some change, though. This is likely due to the experimental design and the lack of a full factorial setup. The effect is relatively minor, and I choose to note it and move along.

## [1] "Accuracy Range on Hardware:  0.0411928291213695"
## [1] "Accuracy Range from Expectation:  0"
## [1] "------"
## [1] "Fixing accuracy at:  0.771872398925075"
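The invariance check reduces to the range of predicted accuracy across hardware at the fixed inputs. A sketch, where `candidates` is a hypothetical prediction grid with the (assumed) column names used here:

```r
## Hold the accuracy-maximizing inputs fixed; vary only the hardware
fixed <- candidates[candidates$trainingExamples == 1000 &
                    candidates$covariates == 5, ]
print(paste("Accuracy Range on Hardware: ", diff(range(fixed$accuracy))))
```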

Now, fix the algorithm inputs at the accuracy-maximizing point, effectively fixing the expected model accuracy. An algorithm with these inputs can be run on any combination of hardware. Project how long that specific model would take and how much it would cost on each computing type. Plot those out on time vs. cost axes.

The optimal solution is the one that balances time and cost equally during the minimization. We use a standardized Euclidean distance, dividing each dimension by its standard deviation so the two are weighted equally. For each candidate combination of hardware, we calculate the distance between it and the origin of these two axes. We then find the minimum of those distances and call that point the optimum.

Our job is complete. We’ve now optimized both the hardware and software dimensions of the problem.

## [1] "------MARS OPTIMAL--------"
## [1] "Predicted Optimal Accuracy 0.771872398925075 +/- 0"
## [1] "Predicted Optimal Cost (seconds) 6.83696138039275"
## [1] "Predicted Optimal Cost (cents) 8.1522560107527"
## [1] "Cores:  7"
## [1] "Memory: 16"
## [1] "Training Examples: 1000"
## [1] "Covariates: 5"

Everything up to this point was done using the mean of the posterior distribution, a choice which simplifies the process but causes some information loss and may cause over-confidence in the predictions. We can modify our steps to include information from the entire posterior, which may solve this issue.

Instead of projecting just the mean time and mean cost for the distance minimization, use the entire set of posterior samples. Calculate the distance metric for each sample in the posterior independently. You’re then left with a density distribution of distances, from which we can infer the minimum value.

The posteriors are in a line, since there’s a fixed linear relationship between time and cost.

Now, find the distance metrics for all of those points.
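bartMachine exposes full posterior draws via `bart_machine_get_posterior`. A sketch of the per-sample distances, where `time.bart` is the fitted time model, `hw` a hypothetical data frame holding the predictor columns for the hardware candidates, and `centsPerSecond` an assumed per-candidate billing rate; cost is taken as linear in time, per the fixed relationship noted above.

```r
## One execution-time draw per posterior sample for each hardware candidate
post <- bart_machine_get_posterior(time.bart, hw)
time.samples <- post$y_hat_posterior_samples      ## rows: candidates, cols: draws
cost.samples <- time.samples * hw$centsPerSecond  ## cost is linear in time

## Standardized distance to the origin, computed per posterior sample
s.t <- sd(as.vector(time.samples))
s.c <- sd(as.vector(cost.samples))
d   <- sqrt((time.samples / s.t)^2 + (cost.samples / s.c)^2)
hw$distance.mean <- rowMeans(d)
hw$distance.sd   <- apply(d, 1, sd)
```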

There’s a lot of overlap in this figure, and many points are far from the optimum. We don’t care about those. Take the few closest to the minimum and look at their distributions.
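Sorting by mean distance and keeping the head gives the shortlist. A sketch, assuming a hypothetical data frame `hw` carrying the `distance.mean` and `distance.sd` columns shown in the table:

```r
## Keep the 25 candidates nearest the minimum mean distance
nearest <- head(hw[order(hw$distance.mean), ], 25)
nearest[, c("cores", "GBMemory", "seconds", "cost",
            "distance.mean", "distance.sd")]
```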

Now, the optimal configuration may be one of the following:

config cores GBMemory seconds cost distance.mean distance.sd
81 7 16 6.836961 8.152256 10.75961 1.763618
93 8 16 6.836961 9.316864 11.68653 1.915550
105 9 16 6.836961 10.481472 12.65523 2.074331
117 10 16 6.836961 11.646080 13.65683 2.238504
129 11 16 6.836961 12.810688 14.68458 2.406965
141 12 16 6.836961 13.975296 15.73338 2.578874
153 13 16 6.836961 15.139904 16.79927 2.753585
165 14 16 6.836961 16.304512 17.87921 2.930598
21 2 16 15.624480 5.322948 18.53887 8.115716
177 15 16 6.836961 17.469120 18.97078 3.109520
57 5 16 13.744239 11.705969 19.51761 6.828874
33 3 16 15.921746 8.136330 19.99556 8.507027
189 16 16 6.836961 18.633728 20.07210 3.290038
200 17 14 94.171586 240.617819 21.18164 3.471904
69 6 16 13.744239 14.047162 21.24629 7.433711
45 4 16 15.921746 10.848441 21.54564 9.166504
212 18 14 94.171586 254.771808 22.29818 3.654917
224 19 14 94.171586 268.925797 23.42071 3.838912
236 20 14 94.171586 283.079787 24.54842 4.023756
248 21 14 94.171586 297.233776 25.68062 4.209335
260 22 14 94.171586 311.387765 26.81674 4.395558
272 23 14 94.171586 325.541755 27.95630 4.582345
284 24 14 94.171586 339.695744 29.09891 4.769630
9 1 16 29.558919 5.035066 43.45833 26.690439
73 7 2 68.447920 14.402811 76.40071 29.464528

config cores GBMemory seconds cost distance.mean distance.sd cluster
81 7 16 6.836961 8.152256 10.75961 1.763618 1
93 8 16 6.836961 9.316864 11.68653 1.915550 1
105 9 16 6.836961 10.481472 12.65523 2.074331 1
117 10 16 6.836961 11.646080 13.65683 2.238504 1
129 11 16 6.836961 12.810688 14.68458 2.406965 1
141 12 16 6.836961 13.975296 15.73338 2.578874 1
153 13 16 6.836961 15.139904 16.79927 2.753585 1
165 14 16 6.836961 16.304512 17.87921 2.930598 1
21 2 16 15.624480 5.322948 18.53887 8.115716 1
177 15 16 6.836961 17.469120 18.97078 3.109520 1
57 5 16 13.744239 11.705969 19.51761 6.828874 1
33 3 16 15.921746 8.136330 19.99556 8.507027 1
189 16 16 6.836961 18.633728 20.07210 3.290038 1
69 6 16 13.744239 14.047162 21.24629 7.433711 1
45 4 16 15.921746 10.848441 21.54564 9.166504 1

In the results above, you’re actually seeing the trade-off between time and money play out quite nicely. Adding cores costs money but, in the case of MARS, reduces time. Here, that trade-off almost exactly evens out.

Data Constraint

In this case, we’ve got a constraint on the amount of data available to us.

## [1] "Current data threshold is  45"
## [1] "Accuracy is maximized at 44 training examples and 5 predictors."
## [1] "Expected Max Accuracy is  0.695787884494371"
## [1] "Now there are only:  287 candidates, instead of  287 candidates that can complete this scenario under budget."

## [1] "Recommended # cores:  2"
## [1] "Recommended Memory:  16"
## [1] "Expected Cost:  0.353453023150648"
## [1] "Expected Seconds:  1.03749272968959"

Finally, come back around, and find the computing hardware that’s best for these inputs.

## [1] "Now there are only:  287 candidates, instead of  287 candidates that can complete this scenario under budget."

## [1] "Recommended # cores:  2"
## [1] "Recommended Memory:  16"
## [1] "Expected Cost:  0.961444027842914"
## [1] "Expected Seconds:  2.82213228790335"

config cores GBMemory seconds cost distance.mean distance.sd
21 2 16 2.822132 0.9614440 3.199037 1.221080
33 3 16 2.881840 1.4726781 3.470269 1.332790
57 5 16 2.525749 2.1511805 3.628016 1.625571
81 7 16 2.154028 2.5684195 3.738884 1.813120
45 4 16 2.881840 1.9635708 3.739289 1.436110
69 6 16 2.525749 2.5814165 3.949350 1.769549
93 8 16 2.154028 2.9353366 4.060982 1.969317
105 9 16 2.154028 3.3022537 4.397599 2.132555
117 10 16 2.154028 3.6691708 4.745646 2.301335
129 11 16 2.154028 4.0360878 5.102784 2.474524
141 12 16 2.154028 4.4030049 5.467233 2.651259
9 1 16 5.416273 0.9226079 5.642901 1.272508
153 13 16 2.154028 4.7699220 5.837623 2.830874
165 14 16 2.154028 5.1368391 6.212892 3.012856
177 15 16 2.154028 5.5037562 6.592206 3.196799
189 16 16 2.154028 5.8706732 6.974906 3.382384
200 17 14 9.839304 25.1404054 7.360464 3.569355
212 18 14 9.839304 26.6192528 7.748453 3.757505
73 7 2 7.142564 1.5029382 7.968578 3.057241
85 8 2 7.142564 1.7176437 8.020125 3.077018
97 9 2 7.142564 1.9323492 8.078148 3.099279
224 19 14 9.839304 28.0981002 8.138525 3.946665
109 10 2 7.142564 2.1470546 8.142506 3.123971
121 11 2 7.142564 2.3617601 8.213053 3.151037
133 12 2 7.142564 2.5764655 8.289630 3.180417

config cores GBMemory seconds cost distance.mean distance.sd cluster
21 2 16 2.822132 0.961444 3.199037 1.221080 1
33 3 16 2.881840 1.472678 3.470269 1.332790 1
57 5 16 2.525749 2.151181 3.628016 1.625571 1
81 7 16 2.154028 2.568420 3.738884 1.813120 1
45 4 16 2.881840 1.963571 3.739289 1.436110 1
69 6 16 2.525749 2.581417 3.949350 1.769549 1
93 8 16 2.154028 2.935337 4.060982 1.969317 1
105 9 16 2.154028 3.302254 4.397599 2.132555 1

Cost Constraint II

config cores GBMemory seconds cost distance.mean distance.sd
9 1 16 5.530293 0.9420301 1.817675 0.4006075
2 1 4 5.746363 0.2878928 2.030041 0.3990605
21 2 16 5.306655 1.8078714 2.039653 0.4285694
33 3 16 5.306655 2.7118071 2.189242 0.4554772
3 1 5 5.530293 0.3324812 2.545638 0.4662674
4 1 6 5.530293 0.3878948 2.547293 0.4665705
8 1 14 5.530293 0.8312030 2.633182 0.5990128
5 1 8 5.530293 0.4987218 2.671534 0.5698418
6 1 10 5.530293 0.6095489 2.678156 0.5703646
7 1 12 5.530293 0.7203760 2.684524 0.5717209
10 1 18 5.530293 1.0528572 2.696252 0.8229942
11 1 20 5.530293 1.1636843 2.723566 0.8309792
12 1 22 5.530293 1.2745113 2.734846 0.8350524
49 5 2 5.818556 0.8745290 4.539993 1.5531048
61 6 2 5.818556 1.0494349 4.562004 1.5606347

config cores GBMemory seconds cost distance.mean distance.sd cluster
9 1 16 5.530293 0.9420301 1.817675 0.4006075 1